Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wasm simd support #84

Merged
merged 1 commit into from
Dec 22, 2021
Merged

Add wasm simd support #84

merged 1 commit into from
Dec 22, 2021

Conversation

alexcrichton
Copy link
Contributor

This commit adds simd acceleration support to the memmem module. This
is added with the freshly-stabilized support from rust-lang/rust#86204.
This mostly just cribs off the generic simd support for 128-bit types
built for sse, copying bits and pieces of code here and there. Some
refactoring happened internally to help reduce duplication where
possible.

I ran some initial benchmarks with the memmem/krate/* regex and a
hacked up single-threaded version of criterion. Some initial
comparisons
using Wasmtime as a runtime do indeed show a lot
of improvements, but there are indeed some slowdowns as well.

@alexcrichton
Copy link
Contributor Author

I'll note that this is largely just a first stab, I think there's more simd acceleration in the memchr module but it looked much larger and less generic so it'll take some more time to do something like that. In any case I wanted to put this up for some initial feedback and whether this seems like the right direction to go in. Additionally if there's more benchmarking desired I'm happy to do so as well!

@alexcrichton
Copy link
Contributor Author

I hope it's ok to ping you again @BurntSushi, but if you're busy and/or would prefer to not land this, just let me know!

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This overall looks great. And hopefully this makes it easier to add other arches too in the future.

Meta question: is it possible for wasmtime itself to just use this crate? (Or implement a better memmem itself?)

So I think my main concern here is unfortunately MSRV. I don't treat MSRV bumps as semver breaking changes, but I do generally like to stay conservative for widely used crates like this one. I imagine this bumps the MSRV, although, it looks like CI never ran...?

I think the MSRV thing either means waiting to land this (bummer) or adding version sniffing to build.rs that only enables this on a new enough version of Rust. I'd be willing to support that to unlock this change. (But no new deps please. Probably copy one of @dtolnay's build.rs files.)

And we should get CI running. Not sure why this PR didn't run.

return unsafe { Some(PrefilterFn::new(x86::sse::find)) };
#[cfg(all(not(miri), target_arch = "wasm32", memchr_runtime_simd))]
{
if true {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this if true for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this was to prevent a lint from firing about dead code after this statement, I'll leave a comment

src/memmem/prefilter/mod.rs Show resolved Hide resolved
@BurntSushi
Copy link
Owner

Also, with respect to benchmarks, those look great. The slowdowns are generally what I'd expect I think. At least in theory, it would be quite surprising to see improvements across the board. The benchmarks are designed to put pressure on implementation choices that trade off perf in some cases for others.

@alexcrichton
Copy link
Contributor Author

Ok I think I fixed CI in 0d8dfc0 (and I see now you've approved the CI runs as well, hurray me being a new contributor!)

is it possible for wasmtime itself to just use this crate?

I'm not quite sure what you mean by this, can you elaborate?

So I think my main concern here is unfortunately MSRV

No worries! I'm happy to fixup anything that needs working. From the CI file it looks like it's 1.41.1, and to confirm is the goal that basically this should work on wasm by default with 1.41.1?

@BurntSushi
Copy link
Owner

I'm not quite sure what you mean by this, can you elaborate?

Yeah it might be an ill-formed question as a result of me knowing almost nothing about wasm haha. I probably have the abstraction layers totally confused to the point of it being a category error. But the question was provoked by your benchmarks. Presumably you're comparing the implementation in this crate with something. So I guess, what is that something? Where does it live? If it lives inside a runtime written in Rust (for example), then it seems plausible that it could be replaced with the code in this crate. Or maybe not?

No worries! I'm happy to fixup anything that needs working. From the CI file it looks like it's 1.41.1, and to confirm is the goal that basically this should work on wasm by default with 1.41.1?

Yeah, because I think this did work on Rust 1.41.1 with wasm? Although, maybe it's okay to have a more liberal MSRV policy there if the ecosystem is moving faster in wasm-land. Since that's the path of least resistance, let's just go that route until/if someone complains. But I think we should at least wait until it hits stable probably?

If MSRV on x86 remains unchanged (which I suspect is the case) then that's great.

@alexcrichton
Copy link
Contributor Author

Oh, heh, right! I should explain where the numbers came from... But nah nothing to do with Wasmtime in the sense of code in Wasmtime itself. The benchmark numbers before/after were this crate compiled with/without simd support (e.g. -Ctarget-feature=+simd128 as a compilation flag or not). Wasmtime is interpreting the input as an opaque wasm blob and doesn't have a special memmem routine or anything like that. The result of this PR should be that if you compile a wasm binary with simd support (e.g. pass -Ctarget-feature=+simd128) then the binary will do simd things to get faster searching. Otherwise it'll continue to use the same fallback that exists today (and exhibit the "before" numbers in my gist above)

For MSRV, I'll leave that up to you. I don't think that this PR should affect any non-wasm targets (and if it does that's a bug I'm happy to fix). For wasm I don't think it actually affects MSRV unless you enable the simd feature, but the simd feature won't be stable until 1.54.0 (to be released tomorrow, what a coincidence I thought about this again today). If you'd like though I can implement support that forcibly disables simd support unless the rustc version is >= 1.54.0 on wasm.

I'm also find waiting until tomorrow to land if you'd prefer wasm simd support hit stable :)

@BurntSushi
Copy link
Owner

Ooooooooo, gotya. That makes more sense.

OK, so my plan here is:

  • Wait for 1.54 to come out (today).
  • Try this out for myself on my local machine, and in particular, run the benchmarks to get experience with it.
  • Run the normal x86 benchmarks just to check that there are no serious regressions on that side. (I don't expect there to be.)

And then I think it's good to go!

Also, I created #87 for tracking the port of memchr to a more "generic" implementation. That would enable adding a memchr impl for WASM by "just" adding another shim that calls the generic version with a WASM vector type.

@alexcrichton
Copy link
Contributor Author

For running the benchmarks in wasm it's a bit tricky because criterion out-of-the-box doesn't work on wasm32-wasi (what I was using for running in wasmtime). It uses rayon in a few places and its dependency, plotters, equates wasm32 with "use wasm-bindgen for the web".

I had two local patches to get things working:

basically I made things assume single-threaded and not use web things to get things running in wasmtime. FWIW you can probably do web things today since I think at least Chrome & Firefox are shipping wasm simd support.

Otherwise though my command line to execute the wasm benchmarks was:

../wasmtime/target/release/wasmtime --mapdir target::./target -- no-simd.wasm --bench 'memmem/krate/*' --warm-up-time 1 --save-baseline no-simd

(searching in my bash history). I think I ran cargo +nightly bench --no-run and then copied the binary to simd.wasm and no-simd.wasm. The --mapdir business was so criterion had a place to write all of its results.


Also, if you're busy, I'm happy to look into the refactorings of #87 to get wasm supported with memchr as well as memmem. I skimmed the various modules and it looked like a simple enough port to the Vector trait with perhaps a method or two added. If you've got time though and feel inspired to take this on I'm happy to defer!

One question I had though reviewing the code. You've got an sse42 module but I don't think it's listed in x86/mod.rs, so I don't think it's used? I'm not sure if that's intentional or whether it was in prototype phase and never got finished.

@alexcrichton
Copy link
Contributor Author

I was triggered into thinking about this again today, mind if I ping again @BurntSushi! I'm happy to help out with anything necessary to deal with this of course

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All righty! Thanks for the ping. The thing that was holding me up here was that I wanted to run the benchmark suite locally just to be sure there weren't any regressions. So far, nothing seems to stick out to me beyond some variance. (I'll publish this run eventually.) So I think this is good to go with one small change (switch CI to stable).

Also, could you squash this all down into one commit? (Or I can do it.)

Thanks!

@@ -60,6 +63,10 @@ jobs:
- build: win-gnu
os: windows-2019
rust: stable-x86_64-gnu
- build: wasm
os: ubuntu-18.04
rust: beta-x86_64-gnu # waiting for wasm simd to hit stable
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be switched to stable now?

@alexcrichton
Copy link
Contributor Author

Sure thing, done now! (squashed, switched to stable, and updated Wasmtime as well)

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w00t!

This commit adds simd acceleration support to the `memmem` module. This
is added with the freshly-stabilized support from rust-lang/rust#86204.
This mostly just cribs off the generic simd support for 128-bit types
built for sse, copying bits and pieces of code here and there. Some
refactoring happened internally to help reduce duplication where
possible.

I ran some initial benchmarks with the `memmem/krate/*` regex and a
hacked up single-threaded version of criterion. Some [initial
comparisons][compare] using Wasmtime as a runtime do indeed show a lot
of improvements, but there are indeed some slowdowns as well.

[compare]: https://gist.github.com/alexcrichton/6a72e682e7b6d505ade605359fbe3f2d
@BurntSushi
Copy link
Owner

Urk. Sorry, I flubbed the benchmarks. Redoing them now. Once that's done, I'll put out a memchr 2.5.0 release.

@alexcrichton alexcrichton deleted the wasm branch December 22, 2021 19:45
@alexcrichton
Copy link
Contributor Author

Oh no rush! I had a branch somewhere at some point that did most of the "other half" of this crate in porting it to the same style as what's here. I can try to clean that up to post it here so wasm gets a benefit from all the algorithms here instead of just some of them

@BurntSushi
Copy link
Owner

Yeah that would be great! Thank you. :-)

BurntSushi added a commit that referenced this pull request Apr 30, 2022
This is a long overdue benchmark of the WASM implementation that landed
in #84. I tried to do this a few months ago, but I mucked it up and
never got back around to it. (This is also why I haven't put out a new
release with the WASM implementation, since I wanted to do a sanity
benchmark run first.)
BurntSushi added a commit that referenced this pull request Apr 30, 2022
This is a long overdue benchmark of the WASM implementation that landed
in #84. I tried to do this a few months ago, but I mucked it up and
never got back around to it. (This is also why I haven't put out a new
release with the WASM implementation, since I wanted to do a sanity
benchmark run first.)
@BurntSushi
Copy link
Owner

@alexcrichton So sorry about the delay here, but I've finally published this patch in memchr 2.5.0. I finally got around to running the benchmark suite both before and after this change (which takes 2.5 hours total! wow!) and it looks like there are no major differences.

@alexcrichton
Copy link
Contributor Author

No worries at all, and thanks again @BurntSushi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants